Experiment: ResumableParser#994
Conversation
497df78 to
6162ba8
Compare
Extracted from: #994 Modern compilers shouldn't have problem computing `strlen` at compile time and generating the same code.
Extracted from: ruby/json#994 Modern compilers shouldn't have problem computing `strlen` at compile time and generating the same code. ruby/json@b07f74bd73
| if (*handle) { | ||
| RB_OBJ_WRITTEN(*handle, Qundef, value); | ||
| } |
There was a problem hiding this comment.
This seem to account for most of the perf regression on twitter.json.
I think instead of making rvalue_stack WB protected, we could just not embed it, or just have a secondary non-protected object just to mark it.
|
Great! Here are some random notes: MessagePack like API (splitting a parsing API to appending new data, parsing buffer and getting parsed data) is good. MessagePack also uses
How about diff --git a/ext/json/ext/parser/parser.c b/ext/json/ext/parser/parser.c
index 749d594..f0731ac 100644
--- a/ext/json/ext/parser/parser.c
+++ b/ext/json/ext/parser/parser.c
@@ -2254,9 +2254,8 @@ static VALUE cResumableParser_parse(VALUE self)
static VALUE cResumableParser_value(VALUE self)
{
JSON_ResumableParser *parser = cResumableParser_get(self);
- json_frame *frame = json_frame_stack_peek(&parser->frames);
- if (frame->phase == JSON_PHASE_DONE) {
+ if (parser->state.value_stack->head > 0) {
return *rvalue_stack_peek(parser->state.value_stack, 1);
} else {
rb_raise(rb_eArgError, "no ready value"); // TODO: Figure out the best exception and message@tompng shared an use case of resumable parser:
For example:
The above diff doesn't satisfy this use case but we can use it for simpler case such as If we also provide an API that returns not processed data something like the following, we may be able to cover the generative AI API response use case: diff --git a/ext/json/ext/parser/parser.c b/ext/json/ext/parser/parser.c
index 749d594..d39ba11 100644
--- a/ext/json/ext/parser/parser.c
+++ b/ext/json/ext/parser/parser.c
@@ -2263,6 +2263,15 @@ static VALUE cResumableParser_value(VALUE self)
}
}
+static VALUE cResumableParser_rest(VALUE self)
+{
+ JSON_ResumableParser *parser = cResumableParser_get(self);
+
+ return rb_str_substr(parser->buffer,
+ parser->state.cursor - parser->state.start,
+ parser->state.end - parser->state.cursor);
+}
+
void Init_parser(void)
{
#ifdef HAVE_RB_EXT_RACTOR_SAFE
@@ -2289,6 +2298,7 @@ void Init_parser(void)
rb_define_method(cResumableParser, "feed", cResumableParser_feed, 1);
rb_define_method(cResumableParser, "parse", cResumableParser_parse, 0);
rb_define_method(cResumableParser, "value", cResumableParser_value, 0);
+ rb_define_method(cResumableParser, "rest", cResumableParser_rest, 0);
CNaN = rb_const_get(mJSON, rb_intern("NaN"));
rb_gc_register_mark_object(CNaN); |
Agreed. As I was working on this, I was thinking of simply renaming it to
So your snippet wouldn't work, because the Hash and Array are only built once complete. e.g. However we could indeed build the partial object on demand. It's an interesting feature, I'll see about adding it, but I think it should either be a different method or an optional paramter, e.g.
Interesting, wouldn't be hard indeed. |
Fix: ruby#983 Numerous known issues TODO: - `object_start_cursor` recorded in frame becomes invalid if the buffer string is reallocated or spilled. - The buffer need to be shrunk sometimes. - Lot more testing needed. - Unclear what to do with top level numbers (and perhaps true/false/null) - API is all but final - I'd like to be able to "pop" the value, so we don't uselessly keep a reference on it. - Then methods need to be documented. - It would worth trying to make `json_parse_any` exception free. - Right now EOF errors have been eliminated in favor of returning `false`. - We could try to do the same with syntax errors. - But then we need to `rb_protect` when calling back into Ruby or other unsafe APIs, so perhaps it's best to just accept it.
Fix: #983
Numerous known issues TODO:
JSON.parse. Right nowtwitter.jsonis7-10%slower, that's not OK. We might need to duplicate the parsing loop if necessary.rvalue_stack_pushbrough most of the performance back.cResumableParserbut not the embeddedrvalue_stack.object_start_cursorrecorded in frame becomes invalid if the buffer string is reallocated or spilled.json_parse_anyexception free.false.rb_protectwhen calling back into Ruby or other unsafe APIs, so perhaps it's best to just accept it.